Limit aiohttp simultaneously opened connections to NVD #1093

Merged
merged 1 commit into intel:main on Mar 11, 2021

Conversation

imsahil007
Contributor

Pull request #1091 by @param211 can be an alternate solution for #1081. But I manually tried limiting connections to NVD, and it seems NVD allows a maximum of 20 simultaneous connections per host.

    connector = aiohttp.TCPConnector(limit_per_host=19)
    self.session = aiohttp.ClientSession(connector=connector, trust_env=True)

This worked for me every time.
I am just putting it here as a feasible solution, @terriko.
Also, the only files I changed are cvedb.py and test_cvedb.py (the other changes are related to isort).
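For context, here is a minimal, self-contained sketch of the approach in use (the feed URL list and fetch helper are illustrative, not the actual cvedb.py code):

    import asyncio

    import aiohttp

    # Illustrative NVD year-feed URLs; cvedb.py builds its own list.
    FEED_URLS = [
        f"https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-{year}.json.gz"
        for year in range(2002, 2022)
    ]

    async def download_all():
        # Stay below NVD's observed limit of 20 simultaneous connections per host.
        connector = aiohttp.TCPConnector(limit_per_host=19)
        async with aiohttp.ClientSession(connector=connector, trust_env=True) as session:

            async def fetch(url):
                async with session.get(url) as response:
                    response.raise_for_status()
                    return await response.read()

            return await asyncio.gather(*(fetch(url) for url in FEED_URLS))

    feeds = asyncio.run(download_all())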

@imsahil007
Contributor Author

might fix #1081

@param211
Contributor

param211 commented Mar 9, 2021

Hey @imsahil007!

  • I think what's happening here is that decreasing the number of concurrent connections inadvertently decreases the number of requests made per unit time too.

  • Part of the reason why the downloads worked fine on our local machines (for most of us) while failing in CI could be that the CI environment had far higher download speeds, and thus reached the rate limit. If there were a host limit, the downloads on our local machines would've failed too. (I might be wrong here.)

  • My guess is that limit_per_host=19 constrains the requests just enough that CI's download speeds aren't able to breach the rate limit. However, if someone were to use even higher download speeds, each request would close faster (while remaining under the limit_per_host) and we would still breach the rate limit.

Point being that limit_per_host in conjunction with a particular download speed might keep the requests under the rate limit, but with a different/higher download speed the rate limit would be breached anyway.

I would note, however, that this is all conjecture, so I might be wrong ;)
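To make the distinction concrete: a connection cap bounds how many requests are in flight at once, while a rate limiter bounds how many may start per unit time. A minimal sketch of the latter (this RateLimiter class is hypothetical, not the code from #1091):

    import asyncio
    import time

    class RateLimiter:
        """Allow at most `rate` request starts per second, regardless of download speed."""

        def __init__(self, rate: float):
            self.interval = 1.0 / rate
            self.next_slot = time.monotonic()
            self.lock = asyncio.Lock()

        async def wait(self):
            async with self.lock:
                now = time.monotonic()
                if now < self.next_slot:
                    await asyncio.sleep(self.next_slot - now)
                self.next_slot = max(now, self.next_slot) + self.interval

Each task would call `await limiter.wait()` before issuing its request, so even with unlimited bandwidth no more than `rate` requests could start per second.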

@imsahil007
Contributor Author

> Hey @imsahil007!
>
>   • I think what's happening here is that decreasing the number of concurrent connections inadvertently decreases the number of requests made per unit time too.
>   • Part of the reason why the downloads worked fine on our local machines (for most of us) while failing in CI could be that the CI environment had far higher download speeds, and thus reached the rate limit. If there were a host limit, the downloads on our local machines would've failed too. (I might be wrong here.)
>   • My guess is that limit_per_host=19 constrains the requests just enough that CI's download speeds aren't able to breach the rate limit. However, if someone were to use even higher download speeds, each request would close faster (while remaining under the limit_per_host) and we would still breach the rate limit.
>
> Point being that limit_per_host in conjunction with a particular download speed might keep the requests under the rate limit, but with a different/higher download speed the rate limit would be breached anyway.
>
> I would note, however, that this is all conjecture, so I might be wrong ;)

Yes, I am decreasing the number of simultaneous connections that make separate requests, so it will decrease the requests per unit time, since the connection count is lower than the default value of 50.
I don't know about other users, but I wasn't able to download the NVD data even a single time using the previous code (even using semaphores). I tried running cve-bin-tool at least 20 times. So the issue was related less to internet download speed than to the parallel connections used to speed up the scraping.

You can say that limit_per_host=19 limits the requests to a sweet spot, but what it actually does is cap simultaneous connections, and the right cap may vary according to how a site is deployed and how its requests are restricted.
You can read more about it here: https://docs.aiohttp.org/en/stable/client_reference.html
I am not saying this is a better solution to the problem. I am just putting it here as a feasible solution, nothing else.
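For reference, the two connector knobs described in those docs (the values here are only for illustration):

    import aiohttp

    connector = aiohttp.TCPConnector(
        limit=100,          # total simultaneous connections in the pool (aiohttp default: 100)
        limit_per_host=19,  # simultaneous connections per endpoint (aiohttp default: 0, i.e. unlimited)
    )
    # Neither knob bounds requests per unit time; both only cap open connections.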

@terriko
Contributor

terriko commented Mar 9, 2021

We're having a bit of a discussion about which of the two currently working solutions to #1081 is going to be most robust before I choose one. But in the meantime, do you mind splitting the isort fixes into a separate pull request? We can definitely get those merged so they don't muddy up anyone else's pull requests!

@terriko
Contributor

terriko commented Mar 11, 2021

Because #1091 introduces a new MIT-licensed component, I've got some additional paperwork to do before I can merge it, to make sure we have appropriate license compliance and follow best practices for keeping the license clear.

I'm going to merge this fix for now just so CI isn't broken, but I believe the rate limit proposed in #1091 is useful (this PR is a connection limit), so we'll be integrating the two to play well together after I deal with the licensing work.
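To sketch how the two could compose (the one-request-per-interval pacing below is an assumed stand-in for #1091's rate limiter, not its actual code):

    import asyncio

    import aiohttp

    async def fetch(session, url):
        async with session.get(url) as response:
            return await response.read()

    async def download_all(urls, interval=0.1):
        # Connection limit (this PR): at most 19 simultaneous connections to NVD.
        connector = aiohttp.TCPConnector(limit_per_host=19)
        async with aiohttp.ClientSession(connector=connector, trust_env=True) as session:
            tasks = []
            for url in urls:
                # Rate limit (stand-in for #1091): start at most one request per
                # `interval` seconds, however quickly earlier requests finish.
                await asyncio.sleep(interval)
                tasks.append(asyncio.create_task(fetch(session, url)))
            return await asyncio.gather(*tasks)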

terriko merged commit b72e458 into intel:main on Mar 11, 2021